Cost Sensitive Discretization of
Abstract
Many decision tree learning algorithms are not designed to handle numeric-valued attributes well, so the continuous feature space has to be discretized. In this article we introduce cost-sensitive discretization, both as a preprocessing step to the induction of a classifier and as an elaboration of the error-based discretization method, to obtain an optimal multi-interval splitting for each numeric attribute. We give a transparent description of the method and of the steps involved in cost-sensitive discretization. We also evaluate its performance against two other well-known methods, entropy-based discretization and pure error-based discretization, on a real-life financial dataset. From the algorithmic point of view, we show that an important deficiency of error-based discretization methods can be solved by introducing costs. From the application point of view, we found that using a discretization method is advisable. To conclude, we use ROC curves to illustrate that under particular conditions cost-based discretization may be optimal.
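The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to read "optimal multi-interval splitting" under misclassification costs, not the authors' implementation. It assumes binary labels (0 and 1), a cap on the number of intervals, and a dynamic program over the sorted distinct values. The names `interval_cost` and `cost_sensitive_cuts`, the parameters `cost_fp` and `cost_fn`, and the toy data are all hypothetical. With equal costs the sketch reduces to pure error-based discretization.

```python
from typing import List, Sequence, Tuple


def interval_cost(n_neg: int, n_pos: int, cost_fp: float, cost_fn: float) -> float:
    """Cost of giving one interval its cheapest single label (assumed cost model)."""
    # Labelling the interval positive turns every negative into a false positive;
    # labelling it negative turns every positive into a false negative.
    return min(n_neg * cost_fp, n_pos * cost_fn)


def cost_sensitive_cuts(
    values: Sequence[float],
    labels: Sequence[int],
    max_intervals: int,
    cost_fp: float = 1.0,
    cost_fn: float = 1.0,
) -> Tuple[List[float], float]:
    """Return (cut points, total cost) minimising misclassification cost."""
    # Group equal values so that a cut can only fall between distinct values.
    distinct: List[list] = []  # each entry is [value, n_neg, n_pos]
    for v, y in sorted(zip(values, labels)):
        if not distinct or distinct[-1][0] != v:
            distinct.append([v, 0, 0])
        distinct[-1][1 + y] += 1
    m = len(distinct)
    if m == 0:
        return [], 0.0

    # Prefix counts make the cost of any block of values an O(1) lookup.
    neg, pos = [0] * (m + 1), [0] * (m + 1)
    for i, (_, n0, n1) in enumerate(distinct):
        neg[i + 1], pos[i + 1] = neg[i] + n0, pos[i] + n1

    def seg(i: int, j: int) -> float:
        return interval_cost(neg[j] - neg[i], pos[j] - pos[i], cost_fp, cost_fn)

    inf = float("inf")
    k_max = min(max_intervals, m)
    # best[k][j]: minimum cost of covering the first j distinct values with k intervals.
    best = [[inf] * (m + 1) for _ in range(k_max + 1)]
    back = [[0] * (m + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, m + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + seg(i, j)
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i

    # Among equally cheap solutions, prefer the one with the fewest intervals.
    k_best = min(range(1, k_max + 1), key=lambda k: (best[k][m], k))
    cuts, j = [], m
    for k in range(k_best, 0, -1):
        i = back[k][j]
        if i > 0:  # place the cut midway between neighbouring distinct values
            cuts.append((distinct[i - 1][0] + distinct[i][0]) / 2)
        j = i
    return sorted(cuts), best[k_best][m]


values = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 0, 0, 1]
print(cost_sensitive_cuts(values, labels, max_intervals=2))               # ([5.5], 1.0)
print(cost_sensitive_cuts(values, labels, max_intervals=2, cost_fn=3.0))  # ([2.5], 2.0)
```

On this toy data, making a missed positive three times as expensive as a false alarm moves the single cut from 5.5 to 2.5, which illustrates how introducing costs can change the chosen intervals.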
Similar resources
Cost Sensitive Discretization of Numeric Attributes
Many algorithms in decision tree learning have not been designed to handle numerically-valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost-sensitive discretization as a preprocessing step to induction of a classifier and as an elaboration of the error-based discretization method to obtain ...
To Express Required CT-Scan Resolution for Porosity and Saturation Calculations in Terms of Average Grain Sizes
Despite advancements in specifying the 3D internal microstructure of reservoir rocks, identifying some sensitive phenomena is still problematic, particularly because of limited image resolution. Discretization studies on such CT-scan data have always run into the problem that the original data do not fully describe the real porous media. As an attractive alternative approach, one can recon...
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which implies that SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
A particle swarm optimization algorithm for minimization analysis of cost-sensitive attack graphs
To prevent an exploit, the security analyst must implement a suitable countermeasure. In this paper, we consider cost-sensitive attack graphs (CAGs) for network vulnerability analysis. In these attack graphs, a weight is assigned to each countermeasure to represent the cost of its implementation. There may be multiple countermeasures with different weights for preventing a single exploit. Also,...